ABSTRACT

The linearized nonlinear regression method (Walsh [1963]) has substantial
curve-fitting flexibility. It also permits isolation and probabilistic
investigation of pertinent effects, and can be used for developing
paired comparisons for persons, animals, or plants ("items" of a given
type). Here, an item is identified by the values for specified
characteristics, and two kinds of "treatment" (e.g., exposure and
nonexposure to radiation) are compared with respect to observed values
for a given characteristic. Ideally, two items being compared for
treatment effect should be the same with regard to the other
characteristics (those not used for comparison). That is, they should
have the same set of values for these other characteristics. This ideal
situation seldom occurs. However, by suitable use of the linearized
nonlinear regression model, composite items can be constructed (for
given treatment) that are the same with respect to the other
characteristics. Modified paired comparisons are obtained on the basis
of these composite items. The probability properties of modified paired
comparisons can be very heterogeneous, so that special concepts and
statistical techniques are needed. Two approaches for development of
tests and confidence intervals are given. Some applications involving
exposure to radiation are discussed.

*Research partially supported by NASA Grant NGR 44-007-020. Also associated
with ONR Contract N00014-68-A-0515.
1. INTRODUCTION

This  paper  is  concerned  with  development  of  generally  applicable 
methods  for  developing  paired  comparisons  for  "items"  (persons,  animals, 
or  plants)  of  a  given  type.  The  comparison  is  with  respect  to  the  influence 
of  two  kinds  of  "treatment"  on  a  specified  univariate  characteristic  of 
the  items.  For  example,  the  items  might  be  people,  the  treatment  could 
be  exposure  and  nonexposure  to  nuclear  radiation,  and  the  characteristic 
investigated  might  be  an  index  of  premature  aging. 

Other characteristics (physical and environmental) besides the one
used for comparison can be needed to satisfactorily identify an item.
The items receiving one kind of treatment usually differ from nearly all
of those receiving the other kind of treatment with respect to the
combination of values for these other characteristics. Thus, determination
of whether differences between observed values (of the comparison
characteristic) are actually due to different treatment is difficult.
That is, the disagreement could be due to a different combination of
values for the other characteristics, rather than different treatment.
This is further complicated by the presence of statistical variation.
Some approach that investigates treatment effects in a probabilistic
manner (accounting for statistical variation) is needed.

Often, specialized information about the probabilistic properties
of the observed characteristic for comparison is lacking. Then, the
statistical method needs to be of a general nature. The linearized
nonlinear regression model that is outlined in section 3 (see Walsh [1963]
for a more detailed statement) seems to provide a satisfactory basis
for many situations. For example, a modification of comparisons involving
pairs of items can be developed (different treatments for the items of
a pair). Instead of using actual items, two composite items are constructed
for each pair (with respect to observed values of the characteristic for
comparison). The composite items of a pair are constructed to be the
same with respect to the combination of values for the other
characteristics. Thus, if the model is capable of approximating the true
situation, the difference in observed values for a modified pair can be
attributed (approximately) to treatment difference. Then, by development
of suitable statistical techniques for analysis of modified paired
comparison data, treatment difference can be investigated.

Section 2 contains a general discussion of the regression approach.
This is followed by an outline of the linearized nonlinear regression
model. The associated probability model is outlined in section 4. Two
ways of constructing composite items are given in section 5. Two analysis
methods for modified paired comparisons are outlined in section 6. Finally,
some potential uses of these results, in particular for investigating
exposure to radiation, are discussed.

2. GENERAL DISCUSSION OF REGRESSION

Each item can be represented by a "vector" (multidimensional point).
The first coordinate of this vector corresponds to the characteristic
being investigated while the other coordinates correspond to the physical
and environmental characteristics that are believed to have an important
influence on the probability distribution of the first coordinate. Each
characteristic has a set of possible values (or levels) and the value of
the vector for an item is obtained by giving each coordinate the value
of the corresponding characteristic. Notationally, a vector is of the
form $(y; x_1, \ldots, x_k)$, where $y$ corresponds to the characteristic
being investigated and the $x$'s correspond to the $k$ other
characteristics. Let $y_i, x_{1i}, \ldots, x_{ki}$ be the values of these
characteristics that occur for the $i$-th item of the $n$ items for which
data are available. Then, the $i$-th item is represented by
$(y_i; x_{1i}, \ldots, x_{ki})$.

For the situations considered, the values of the $x$'s are assumed to
be fixed quantities and the value of $y$ is considered to be a random
variable. A regression approach is convenient for investigating $y$ in
such a manner that the influence of the $x$'s receives explicit
consideration. Stated in a general manner, $y$ is expressed in the form

$$y = h(x_1, \ldots, x_k; A_1, \ldots, A_t) + e',$$

where the function $h$ is completely specified except for the values of the
"regression coefficients" $A_1, \ldots, A_t$ and $e'$ is a random variable.
A "true" set of values exists for the $A$'s, but these values are virtually
always unknown. Thus, the $A$'s occur as unknown parameters. Ideally, $h$
should be chosen so that the general magnitude of $e'$ is as small as
possible. The function $h(x_1, \ldots, x_k; A_1, \ldots, A_t)$ is called the
regression function of $y$ on $x_1, \ldots, x_k$. Using regression
terminology, $y$ is the dependent variable and $x_1, \ldots, x_k$ are the
independent variables for the regression.

Using this representation, the observed value of $y$ for the $i$-th item
is expressed in the form

$$y_i = h(x_{1i}, \ldots, x_{ki}; A_1, \ldots, A_t) + e'_i.$$

It is often reasonable to assume that the $e'_i$ are statistically
independent. However, for heterogeneous situations, this is one of the few
assumptions that is acceptable. There is seldom much justification for the
assumptions that the $e'_i$ have zero expected values, that they are sample
values from the same population, that they have the same variance, etc.
The lack of specialized information about the probability properties
of the $e'_i$ is one of the fundamental difficulties encountered in
analyzing heterogeneous data. Other difficulties occur with respect to
selection of the function $h$. If an elementary form, such as linear, is
used for $h$, isolation of effects due to the independent variables is
rather easily accomplished. For example, composite items for paired
comparisons can be constructed without substantial computational effort.
On the other hand, use of an unsuitable form for $h$ can result in poor
agreement between $y_i$ and $h(x_{1i}, \ldots, x_{ki}; A_1, \ldots, A_t)$
over the values of $i$. In many cases, the failure of a regression function
to provide a good fit to the data may be largely due to the restricted
curve-fitting capability of the functional form that is used for $h$. A
basic problem in obtaining regression functions that are suitable for
heterogeneous data is to greatly increase the curve-fitting capability of
the regression function without substantial changes in the desirable
manipulation-computation properties that occur for elementary forms such
as linear. The linearized nonlinear regression model seems to furnish a
satisfactory solution to this problem.

3.  LINEARIZED  NONLINEAR  REGRESSION  MODEL 
Let the range of possible values for $y$ be $y_L \le y \le y_U$. The
approach for linearized nonlinear regression consists in expressing the
regression function in a transformed manner. Specifically, let
$g_1(y), \ldots, g_s(y)$ be specified functions of $y$ while
$g_{s+2}(x_1, \ldots, x_k), \ldots, g_t(x_1, \ldots, x_k)$ are specified
functions of $x_1, \ldots, x_k$. Then, $y$ is implicitly expressed in
the form

(1)  $$y + A_1 g_1(y) + \cdots + A_s g_s(y) = A_{s+1} + A_{s+2} g_{s+2}(x_1, \ldots, x_k) + \cdots + A_t g_t(x_1, \ldots, x_k) + e'',$$

where $A_1, \ldots, A_s$ are such that the left-hand side of (1) is a
monotonic function of $y$ for $y_L \le y \le y_U$. Let
$h(x_1, \ldots, x_k; A_1, \ldots, A_t)$ be the solution of (1) for $y$
when $e'' = 0$. Then, expression (1) is equivalent to an expression of
the form

$$y = h(x_1, \ldots, x_k; A_1, \ldots, A_t) + e',$$

where the function $h$ can have a substantial amount of curve-fitting
capability. The form of (1) allows linear manipulations to be used in
isolating effects that are expressed in terms of the $A$'s. Since $y$
is a monotonic function of

$$A_{s+1} + A_{s+2} g_{s+2}(x_1, \ldots, x_k) + \cdots + A_t g_t(x_1, \ldots, x_k)$$

when statistical variation is neglected, isolation of specified linear
combinations of $A_{s+1}, \ldots, A_t$ is of special interest. Although the
restriction imposed on $A_1, \ldots, A_s$ places some limitation on the use
of linear manipulations, the computational aspects of the linearized
nonlinear model seem to be at a manageable level.

The primary purpose of $g_1(y), \ldots, g_s(y)$ is to furnish sufficient
curve-fitting capability. If $y_L$ and $y_U$ are finite, this can often be
accomplished by letting $s = 2$, $g_1(y) = y^2$, and $g_2(y) = y^4$. The
choice of $s = 3$, $g_1(y) = y^2$, $g_2(y) = y^3$, and $g_3(y) = y^4$ is
available if greater curve-fitting capability is desired. It is anticipated
that finite values can be used for $y_L$ and $y_U$ in nearly all cases.
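
As a small worked illustration (not taken from the source), consider the
simplest hypothetical case of a single function of $y$, namely $s = 1$ with
$g_1(y) = y^2$, and write $R$ (an assumed shorthand) for the right-hand
side of (1) when $e'' = 0$. The monotonicity restriction on $A_1$ and the
resulting explicit regression function would then read:

```latex
y + A_1 y^2 = R, \qquad
\frac{d}{dy}\left( y + A_1 y^2 \right) = 1 + 2 A_1 y > 0
\quad \text{over the range of possible values of } y,
\qquad
h = \frac{\sqrt{1 + 4 A_1 R} - 1}{2 A_1} \quad (A_1 \neq 0).
```

The root with the positive square root is the one for which
$1 + 2 A_1 y = \sqrt{1 + 4 A_1 R} > 0$, and $h$ tends to $R$ as $A_1$ tends
to zero, which recovers the linear case.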
4. ASSOCIATED PROBABILITY MODEL
The dependent variable $y_i$ for the $i$-th item is considered to have
a probability distribution but the independent variables
$x_{1i}, \ldots, x_{ki}$ are considered to be fixed. The $y_i$ are assumed
to be statistically independent (i.e., the $e'_i$ are independent) but the
shape of the distribution for $y_i$ does not necessarily have any relation
to the shape of the distribution of $y_j$ if $i \neq j$.

The key feature of the probability model is the procedure used to
define what the parameters $A_1, \ldots, A_t$ represent. These definitions
are intuitively meaningful and also allow useful probability results to be
developed for heterogeneous situations. This procedure is a generalization
of the median estimation concept.

Suppose that the total number $n$ of items is not too small. By some
data manipulations (see Walsh [1963]), a few "observations" $Y(u;v)$ can be
constructed that are independent, approximately continuous, and of the form

$$Y(u;v) = A_v + e(u;v),$$

where $e(u;v)$ is a random variable. Let $p(u;v)$ be the (unknown) value
of $P[Y(u;v) < A_v]$. Then, the (unknown) value of $A_v$ is defined by the
requirement that the arithmetic average of the $p(u;v)$ over $u$ is equal
to 1/2. Now, by the methods given in Walsh [1963], an approximate median
estimate and approximate confidence intervals can be obtained for the
true but unknown value of $A_v$ $(v = 1, \ldots, t)$.

5. CONSTRUCTION OF COMPOSITE ITEMS
In a general manner, let us consider a couple of procedures that might
be used to construct the composite items for a pair that is used in a
modified paired comparisons investigation of whether treatment effects
are statistically significant. Under the null hypothesis, treatment
has no effect and the same linearized nonlinear regression model can be
used for all items.

First, the values of $x_1, \ldots, x_k$ that are to occur for this pair
are specified. Then, for a given treatment, one problem is to construct
an item with these values for the $x$'s. That is, a suitable subset of
the data with this treatment is manipulated to determine an observed
value of $y$ that, according to the linearized nonlinear regression model,
corresponds to an item with these values for the $x$'s. This value is
used as if it were observed for an item with the given treatment with
these $x$'s. The same procedure is used to determine the observed value
for an item with these values for the $x$'s and the other kind of
treatment. These two $y$ values then constitute the observations for this
pair. Here, nonoverlapping subsets of data are used for the various pairs
that are developed. Thus, a disadvantage of the construction of composite
items is that the data for a number of items are needed in order to
construct one composite item.

One way of determining an observed value for $y$ is to use a suitable
subset of the data to individually estimate $A_1, \ldots, A_s$ and to
estimate the value of

$$A_{s+1} + A_{s+2} g_{s+2}(x_1, \ldots, x_k) + \cdots + A_t g_t(x_1, \ldots, x_k).$$

Using these estimates as if they were the true values, solution of

$$y + A_1 g_1(y) + \cdots + A_s g_s(y) = A_{s+1} + A_{s+2} g_{s+2}(x_1, \ldots, x_k) + \cdots + A_t g_t(x_1, \ldots, x_k)$$

determines a corresponding observed value for $y$.
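
The first way can be sketched numerically as follows. This is a minimal
sketch under stated assumptions, not the procedure of Walsh [1963]: the
coefficient estimates are taken as given, the left-hand side is assumed
monotonic over the supplied range, and the function names
(`lhs_of_model`, `solve_for_y`), the use of bisection, and all numerical
values are illustrative choices that do not appear in the source.

```python
from typing import Callable, Sequence


def lhs_of_model(y: float, a_y: Sequence[float],
                 g_y: Sequence[Callable[[float], float]]) -> float:
    """Left-hand side of the model: y + A_1 g_1(y) + ... + A_s g_s(y)."""
    return y + sum(a * g(y) for a, g in zip(a_y, g_y))


def solve_for_y(rhs: float, a_y: Sequence[float],
                g_y: Sequence[Callable[[float], float]],
                y_lo: float, y_hi: float, tol: float = 1e-10) -> float:
    """Solve lhs_of_model(y) = rhs for y on [y_lo, y_hi] by bisection.

    Assumes the left-hand side is monotonic on the interval, which is the
    restriction the model places on A_1, ..., A_s.
    """
    def f(y: float) -> float:
        return lhs_of_model(y, a_y, g_y) - rhs

    lo, hi = y_lo, y_hi
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("no solution for y inside the stated range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid            # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


# Hypothetical estimates for s = 3 with g_1 = y^2, g_2 = y^3, g_3 = y^4.
g_y = [lambda y: y ** 2, lambda y: y ** 3, lambda y: y ** 4]
a_y_hat = [0.5, 0.1, 0.02]
# Hypothetical estimate of A_{s+1} + A_{s+2} g_{s+2}(x) + ... + A_t g_t(x)
# evaluated at the x values specified for the pair.
rhs_hat = 5.12
y_composite = solve_for_y(rhs_hat, a_y_hat, g_y, y_lo=0.0, y_hi=10.0)
```

With these illustrative numbers the composite value is $y = 2$, since
$2 + 0.5(4) + 0.1(8) + 0.02(16) = 5.12$. Bisection is used here only
because monotonicity guarantees at most one root in the range; any
one-dimensional root-finder would serve.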
A second way of determining an observed value for $y$ is to linearly
combine the data on $t + 1$ items so that the weighted sum of the
individual regression expressions is of the form that occurs when the
specified values $x_1, \ldots, x_k$ occur. That is, weights $w_i$ (not
necessarily positive) are determined so that

$$\sum_{i=1}^{t+1} w_i \left[ y_i + A_1 g_1(y_i) + \cdots + A_s g_s(y_i) \right] = \sum_{i=1}^{t+1} w_i y_i + A_1 g_1\!\left( \sum_{i=1}^{t+1} w_i y_i \right) + \cdots + A_s g_s\!\left( \sum_{i=1}^{t+1} w_i y_i \right)$$

and

$$\sum_{i=1}^{t+1} w_i = 1, \qquad \sum_{i=1}^{t+1} w_i\, g_r(x_{1i}, \ldots, x_{ki}) = g_r(x_1, \ldots, x_k) \qquad (r = s + 2, \ldots, t).$$

Then, $\sum_{i=1}^{t+1} w_i y_i$ is the observed value for $y$.
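
The linear part of the second way can be sketched as follows. This is a
hedged illustration only: it covers the special case in which no functions
of $y$ appear on the left-hand side (so the first condition holds
automatically), it enforces only the conditions that the weights sum to one
and reproduce the specified values of the functions of the $x$'s, and it
picks the minimum-norm weights among all solutions. The minimum-norm
choice, the function names (`solve_linear`, `composite_weights`), and the
data are assumptions that do not appear in the source.

```python
from typing import Callable, List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("conditions on the weights are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def composite_weights(xs: Sequence[Tuple[float, ...]],
                      target: Tuple[float, ...],
                      g_x: Sequence[Callable[[Tuple[float, ...]], float]]
                      ) -> List[float]:
    """Minimum-norm weights with sum(w) = 1 and sum(w_i g_r(x_i)) = g_r(target)."""
    rows = [[1.0] * len(xs)] + [[g(x) for x in xs] for g in g_x]
    rhs = [1.0] + [g(target) for g in g_x]
    # Minimum-norm solution of C w = b is w = C^T (C C^T)^{-1} b.
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in rows] for ri in rows]
    lam = solve_linear(gram, rhs)
    return [sum(l * row[i] for l, row in zip(lam, rows)) for i in range(len(xs))]


# Hypothetical data: four items, two other characteristics, linear g's.
xs = [(0.0, 0.0), (2.0, 1.0), (1.0, 4.0), (3.0, 3.0)]
g_x = [lambda x: x[0], lambda x: x[1]]
target = (1.0, 2.0)
w = composite_weights(xs, target, g_x)
# Noise-free illustrative responses from y = 2 + 3 x_1 - x_2.
ys = [2.0 + 3.0 * a - b for a, b in xs]
y_composite = sum(wi * yi for wi, yi in zip(w, ys))
```

Because the illustrative responses are exactly linear in the $x$'s, the
composite value equals the model value at the target point,
$2 + 3(1) - 2 = 3$, whatever admissible weights are used. When functions of
$y$ are present, the additional nonlinear condition on the weights would
also have to be met, which this sketch does not attempt.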

6. ANALYSIS OF MODIFIED PAIRED COMPARISONS
Having constructed the modified paired comparisons, the difference
of the values for each pair can be formed. More generally, each value
of a pair can be transformed by use of the same function and the difference
of the two transformed values can be formed. These differences can be
used to compare the two treatments (with respect to the specified
characteristic).

Due to the rather general ways in which the modified pairs are
constructed, special statistical procedures will be needed for the analysis
of the differences. Even though the composite items of a pair have
the same values for $x_1, \ldots, x_k$, the observed values for $y$ do not
necessarily have the same distribution under the null hypothesis. Hence,
a difference is not necessarily symmetrically distributed about zero
under the null hypothesis. Also, the distribution for one difference
may be greatly different from that for another difference. In spite
of these difficulties, development of statistical techniques that are
satisfactory for analyzing these independent differences is possible.

One  method,  which  seems  especially  suitable  when  the  second  way 
is  used  to  construct  composite  items,  is  to  investigate  the  value  of 
a  special  parameter  which  is  defined  as  follows: 

For  each  difference,  consider  the  (unknown)  probability 
that  the  difference  is  less  than  the  parameter.  The  (unknown) 
value  of  the  parameter  is  determined  by  the  requirement  that 
the average of these probabilities is 1/2. An approximate
median  estimate  and  approximate  confidence  intervals  can  be 
developed  for  the  value  of  this  parameter.  In  particular, 
procedures  can  be  developed  for  testing  whether  this  parameter 
is  zero,  which  seems  to  be  a  reasonable  choice  for  its  null 
value  (in  most  cases). 
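
The definition of this parameter can be illustrated numerically. The
sketch below is not an estimation procedure and is not the method of
Walsh [1963]; it assumes, purely for illustration, that each difference is
normally distributed with a known mean and standard deviation (in practice
these distributions are unknown) and locates the value at which the average
of the probabilities equals 1/2. The names `normal_cdf`,
`average_probability`, and `generalized_median` are hypothetical.

```python
import math
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(D < x) for a normal difference D with mean mu and s.d. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def average_probability(theta: float, mus: Sequence[float],
                        sigmas: Sequence[float]) -> float:
    """Average over the differences of P(difference < theta)."""
    total = sum(normal_cdf(theta, m, s) for m, s in zip(mus, sigmas))
    return total / len(mus)


def generalized_median(mus: Sequence[float], sigmas: Sequence[float],
                       tol: float = 1e-10) -> float:
    """Value theta at which the average probability equals 1/2 (bisection)."""
    lo = min(m - 10.0 * s for m, s in zip(mus, sigmas))
    hi = max(m + 10.0 * s for m, s in zip(mus, sigmas))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_probability(mid, mus, sigmas) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Heterogeneous, purely hypothetical distributions for three differences.
mus = [0.0, 1.0, 4.0]
sigmas = [1.0, 0.5, 3.0]
theta = generalized_median(mus, sigmas)
```

When every difference has the same median, the parameter coincides with
that common median, which is the sense in which it generalizes the median;
for heterogeneous differences it is a compromise value that need not be the
median of any single difference.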

Another method is based on Walsh [1951] (also see Walsh [1962]).
Here, a suitably chosen function of the medians of the distributions
for the differences is investigated (for example, the arithmetic average
of these population medians). By suitable use of the material in
Walsh [1951], approximate equal-tail confidence intervals can be
obtained for the function of medians that is investigated. These
confidence intervals yield two-sided significance tests of the null value
for the function of the medians. Usually, zero seems to be a suitable
choice for this null value.
7. DISCUSSION OF USES

The  material  of  this  paper  is  oriented  toward  situations  where  the 
items  can  be  of  a  very  heterogeneous  nature.  Consequently,  it  is 
applicable  to  many  biological  and  medical  areas.  However,  the  wide 
applicability  of  the  results  indicates  that  they  are  most  appropriate 
when  a  large  amount  of  data  is  available.  That  is,  the  approach  is 
somewhat  coarse  and  mainly  useful  when  a  large  number  of  items  are 
available  for  the  investigation.  Thus,  this  method  is  especially 
suitable for retrospective studies where information has been obtained
on  a  very  large  number  of  items. 

One  application  area,  which  motivated  development  of  these  results, 
is  investigation  of  exposure  of  persons  to  nuclear  radiation.  This 
method  should  be  satisfactory  for  analyzing  data  collected  for  persons 
exposed to radiation in the atomic bomb attacks on Hiroshima and Nagasaki
during World War II. It should also be somewhat suitable for analyzing
the data from the accidental radiation exposure of the Rongelap people
(e.g., see Conard et al. [1963]).

John E. Walsh [1963], "Use of linearized nonlinear regression for
simulations involving Monte Carlo," Operations Research, Vol. 11, pp. 228-235.


